Woodroofe's One-Armed Bandit Problem Revisited
Abstract
We consider the one-armed bandit problem of Woodroofe [J. Amer. Statist. Assoc. 74 (1979) 799–806], which involves sequential sampling from two populations: one whose characteristics are known, and one which depends on an unknown parameter and incorporates a covariate. The goal is to maximize cumulative expected reward. We study this problem in a minimax setting, and develop rate-optimal policies that involve suitable modifications of the myopic rule. It is shown that the regret, as well as the rate of sampling from the inferior population, can be finite or grow at various rates with the time horizon of the problem, depending on “local” properties of the covariate distribution. Proofs rely on martingale methods and information-theoretic arguments.
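To make the setting concrete, here is a minimal simulation sketch of a plain myopic rule. It is not the modified policy developed in the paper. Everything specific is an assumption made for illustration: the linear reward model `theta * x + noise` for the unknown population, the uniform covariate on [-1, 1], the constant `known_reward` for the known population, the single forced first sample, and the function name `simulate_myopic`.

```python
import random


def simulate_myopic(theta, horizon, known_reward=0.0, noise_sd=1.0, seed=0):
    """Simulate a plain myopic rule on a hypothetical one-armed bandit.

    Assumed model (illustrative, not taken from the paper):
      - known population: constant expected reward `known_reward`
      - unknown population: reward theta * x + Gaussian noise,
        where x is a covariate observed before each decision
    Returns (cumulative expected regret, pulls of the unknown population).
    """
    rng = random.Random(seed)
    sum_xy = 0.0  # running sum of x * y over pulls of the unknown population
    sum_xx = 0.0  # running sum of x * x over pulls of the unknown population
    regret = 0.0
    pulls_unknown = 0

    for _ in range(horizon):
        x = rng.uniform(-1.0, 1.0)  # covariate for this round

        if sum_xx == 0.0:
            # No information yet: force a sample from the unknown population.
            pull_unknown = True
        else:
            # Myopic rule: act as if the least-squares estimate were the truth.
            theta_hat = sum_xy / sum_xx
            pull_unknown = theta_hat * x > known_reward

        best = max(theta * x, known_reward)  # oracle expected reward
        if pull_unknown:
            y = theta * x + rng.gauss(0.0, noise_sd)
            sum_xy += x * y
            sum_xx += x * x
            pulls_unknown += 1
            regret += best - theta * x
        else:
            regret += best - known_reward

    return regret, pulls_unknown
```

Calling `simulate_myopic(theta=0.5, horizon=1000)` returns the accumulated expected regret and the number of samples drawn from the unknown population. The estimate is updated only when the unknown population is sampled, so a purely myopic rule can stop learning, which is one reason modifications of it are of interest.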
Related resources
PAC Bounds for Multi-armed Bandit and Markov Decision Processes
The bandit problem is revisited and considered under the PAC model. Our main contribution in this part is to show that given n arms, it suffices to pull the arms O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability of at least 1 − δ. This is in contrast to the naive bound of O((n/ε²) log(n/δ)). We derive another algorithm whose complexity depends on the specific setting of the rewards,...
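As a rough illustration of where a naive bound of this kind comes from, the sketch below counts the pulls needed if every arm is sampled equally often and the empirically best arm is returned. It assumes rewards in [0, 1], an accuracy parameter ε and a failure probability δ, and uses Hoeffding's inequality with a union bound over the n arms. The constant 2 and the function name `naive_pac_pulls` are choices made for this sketch, not taken from the paper.

```python
import math


def naive_pac_pulls(n_arms, epsilon, delta):
    """Total pulls for uniform sampling under a Hoeffding + union bound.

    Assumes rewards lie in [0, 1]. Each arm's mean is estimated to within
    epsilon / 2 with failure probability at most delta / n_arms, which
    requires (2 / epsilon^2) * ln(2 * n_arms / delta) pulls per arm.
    """
    per_arm = math.ceil((2.0 / epsilon ** 2) * math.log(2.0 * n_arms / delta))
    return n_arms * per_arm
```

For example, `naive_pac_pulls(10, 0.1, 0.05)` gives the total number of pulls for 10 arms. The log(n/δ) factor enters through the union bound, and it is this factor that the improved result replaces with log(1/δ).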
UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem
In the stochastic multi-armed bandit problem we consider a modification of the UCB algorithm of Auer et al. [4]. For this modified algorithm we give an improved bound on the regret with respect to the optimal reward. While for the original UCB algorithm the regret in K-armed bandits after T trials is bounded by const · K log(T)/Δ, where Δ measures the distance between a suboptimal arm an...
Combinatorial Bandits Revisited
This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting under semi-bandit feedback, we derive a problem-specific regret lower bound, and discuss its scaling with the dimension of the decision space. We propose ESCB, an algorithm that efficiently exploits the structure of the problem and provide a finite-time analysis of its regret....
Bandit Problems
We survey the literature on multi-armed bandit models and their applications in economics. The multi-armed bandit problem is a statistical decision model of an agent trying to optimize his decisions while improving his information at the same time. This classic problem has received much attention in economics as it concisely models the trade-off between exploration (trying out each arm to find ...
Exploration and Exploitation Strategies for the K-armed Bandit Problem
In this paper, we study several different methods that can be used in the k-armed bandit problem. Each method is considered under the PAO (probably approximately optimal) framework. The various approaches are also compared empirically. One new approach, based on a Laplace estimator method, is introduced and shown to have good performance.
Publication date: 2009